Clustering Words by Projection Entropy

Authors

  • Isik Baris Fidaner
  • Ali Taylan Cemgil

Abstract

We apply entropy agglomeration (EA), a recently introduced algorithm, to cluster the words of a literary text. EA is a greedy agglomerative procedure that minimizes projection entropy (PE), a function that can quantify the segmentedness of an element set. To apply it, the text is reduced to a feature allocation, a combinatorial object that represents the occurrences of words in the text's paragraphs. The experimental results demonstrate that EA, despite this reduction and its simplicity, is useful in capturing significant relationships among the words in the text. The procedure was implemented in Python and published as free software: REBUS.
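The abstract names three steps: reducing paragraphs to a feature allocation, scoring a word set by projection entropy, and greedily merging clusters. The sketch below illustrates them in Python. It is not the REBUS code. It assumes PE(S) = Σ_B −(|B∩S|/|S|) log(|B∩S|/|S|), summed over the non-empty projections B∩S of the paragraph blocks, so a word set whose members always appear together scores 0. The regex tokenizer, the toy paragraphs, and the function names `feature_allocation`, `projection_entropy` and `entropy_agglomeration` are hypothetical.

```python
import math
import re
from itertools import combinations


def feature_allocation(paragraphs):
    """Reduce a text to a feature allocation: one block per paragraph,
    holding the set of lower-cased words that occur in that paragraph.
    The regex tokenizer is an illustrative assumption."""
    blocks = []
    for paragraph in paragraphs:
        words = frozenset(re.findall(r"[a-z']+", paragraph.lower()))
        if words:
            blocks.append(words)
    return blocks


def projection_entropy(blocks, s):
    """Assumed definition: intersect every block with s, drop the empty
    intersections, and sum -(|B & s| / |s|) * log(|B & s| / |s|).
    A set whose members always occur together scores 0."""
    size = len(s)
    total = 0.0
    for block in blocks:
        k = len(block & s)
        if k:
            p = k / size
            total -= p * math.log(p)
    return total


def entropy_agglomeration(blocks, elements):
    """Greedy agglomeration: start from singleton clusters and repeatedly
    merge the pair whose union has the lowest projection entropy.
    Returns the merge history as (merged cluster, PE) pairs."""
    clusters = [frozenset([e]) for e in elements]
    merges = []
    while len(clusters) > 1:
        # Exhaustive search over all current cluster pairs.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: projection_entropy(blocks, pair[0] | pair[1]),
        )
        union = a | b
        merges.append((union, projection_entropy(blocks, union)))
        clusters = [c for c in clusters if c != a and c != b] + [union]
    return merges


# Toy text: four made-up "paragraphs", used only for illustration.
paragraphs = [
    "The cat sat on the mat.",
    "The cat chased the dog.",
    "A dog and a cat.",
    "Rain fell on the mat.",
]
blocks = feature_allocation(paragraphs)
merges = entropy_agglomeration(blocks, ["cat", "dog", "mat", "rain"])
for cluster, pe in merges:
    print(sorted(cluster), round(pe, 3))
```

On this toy text, {cat, dog} and {mat, rain} both score PE = 0.5 ln 2 ≈ 0.347, lower than any other pair, so they are merged before the final four-word cluster (PE = 2 ln 2). The exhaustive pair search costs O(k²) PE evaluations per merge, which is acceptable for a sketch but slow for a large vocabulary.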

Similar papers

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

Distributional Similarity, Phase Transitions and Hierarchical Clustering

We describe a method for automatically clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy is used to measure the dissimilarity of those distributions. Clusters are represented by "typical" context distributions averaged from the given words accordi...

Fuzzy Entropy Based Fuzzy c-Means Clustering with Deterministic and Simulated Annealing Methods

This article explains how to apply the deterministic annealing (DA) and simulated annealing (SA) methods to fuzzy entropy based fuzzy c-means clustering. By regularizing the fuzzy c-means method with fuzzy entropy, a membership function similar to the Fermi-Dirac distribution function, well known in statistical mechanics, is obtained, and, while optimizing its parameters by SA, the minimum of t...

A Probabilistic Clustering-Projection Model for Discrete Data

For discrete co-occurrence data like documents and words, calculating optimal projections and clustering are two different but related tasks. The goal of projection is to find a low-dimensional latent space for words, and clustering aims at grouping documents based on their feature representations. In general projection and clustering are studied independently, but they both represent the intri...

Hybrid Syntactic Category Induction

Much research has been devoted to the task of learning lexical classes from unannotated input text. Among the chief difficulties facing any approach to the unsupervised induction of lexical classes are that of token-level ambiguity and the classification of rare and unknown words. Following the work of previous authors, the initial stage of syntactic category induction is treated in the current...

Journal: CoRR
Volume: abs/1410.6830
Publication date: 2014